Search CORE

4 research outputs found

Operating system support for overlapping-ISA heterogeneous multi-core architectures

Author: David Koufaty
Dheeraj Reddy
Paul Brett
Rob Knauerhase
Scott Hahn
Tong Li
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2010
Field of study

A heterogeneous processor consists of cores that are asymmetric in performance and functionality. Such a de-sign provides a cost-effective solution for processor man-ufacturers to continuously improve both single-thread per-formance and multi-thread throughput. This design, how-ever, faces significant challenges in the operating system, which traditionally assumes only homogeneous hardware. This paper presents a comprehensive study of OS support for heterogeneous architectures in which cores have asym-metric performance and overlapping, but non-identical in-struction sets. Our algorithms allow applications to trans-parently execute and fairly share different types of cores. We have implemented these algorithms in the Linux 2.6.24 kernel and evaluated them on an actual heterogeneous plat-form. Evaluation results demonstrate that our designs effi-ciently manage heterogeneous hardware and enable signifi-cant performance improvements for a range of applications.

CiteSeerX

Crossref

An analysis of performance interference effects in virtual environments

Author: Calton Pu
Mic Bowman
Paul Brett
Rob Knauerhase
Younggyun Koh
Zhihua Wen
Publication venue
Publication date: 01/01/2007
Field of study

Virtualization is an essential technology in modern datacenters. Despite advantages such as security isolation, fault isolation, and environment isolation, current virtualization techniques do not provide effective performance isolation between virtual machines (VMs). Specifically, hidden contention for physical resources impacts performance differently in different workload configurations, causing significant variance in observed system throughput. To this end, characterizing workloads that generate performance interference is important in order to maximize overall utility. In this paper, we study the effects of performance interference by looking at system-level workload characteristics. In a physical host, we allocate two VMs, each of which runs a sample application chosen from a wide range of benchmark and real-world workloads. For each combination, we collect performance metrics and runtime characteristics using an instrumented Xen hypervisor. Through subsequent analysis of collected data, we identify clusters of applications that generate certain types of performance interference. Furthermore, we develop mathematical models to predict the performance of a new application from its workload characteristics. Our evaluation shows our techniques were able to predict performance with average error of approximately 5%. 1

CiteSeerX

Crossref

Runnemede: An Architecture for Ubiquitous High-Performance Computing

Author: Aditya Agrawal
Asit K. Mishra
Benoit Meister
Ganesh Venkatesh
Howard David
Ivan Ganev
Jianping Xu
Josep Torrellas
Joshua Fryman
Justin Teller
Nicholas P. Carter
Richard Lethin
Rob Knauerhase
Roger A. Golliver
Romain Cledat
Shekhar Borkar
Wilfred R. Pinfold
Publication venue
Publication date: 31/07/2013
Field of study

DARPA’s Ubiquitous High-Performance Computing (UHPC) program asked researchers to develop computing systems capable of achieving energy efficiencies of 50 GOPS/Watt, assuming 2018-era fabrication technologies. This paper describes Runnemede, the research architecture developed by the Intel-led UHPC team. Runnemede is being developed through a co-design process that considers the hardware, the runtime/OS, and applications simultaneously. Near-threshold voltage operation, fine-grained power and clock management, and separate execution units for runtime and application code are used to reduce energy consumption. Memory energy is minimized through application-managed on-chip memory and direct physical addressing. A hierarchical on-chip network reduces communication energy, and a codelet-based execution model supports extreme parallelism and fine-grained tasks. We present an initial evaluation of Runnemede that shows the design process for our on-chip network, demonstrates 2-4x improvements in memory energy from explicit control of on-chip memory, and illustrates the impact of hardware-software co-design on the energy consumption of a synthetic aperture radar algorithm on our architecture. 1

CiteSeerX

Crossref